Determining optimal parameters of the Self Referent Encoding Task: A large-scale examination of self-referent cognition and depression

This file is one of a series of supplemental explanatory documents for the study “Determining optimal parameters of the Self Referent Encoding Task: A large-scale examination of self-referent cognition and depression”. Data and code are located at doi: 10.18738/T8/XK5PXX, and rendered R Markdown walkthroughs are available and navigable on the paper’s GitHub Pages website.

Data description

This file loads the models created with beset (see the model-creation file) and then prints summary statistics and plots. The plots appear in the paper, as do many of the summary statistics. For more information on these summary statistics and what can be gleaned from the models, see the beset documentation via help("summary.beset").

If you are viewing this as an HTML file, and wish to see the code, please download the R Markdown file from the Texas Data Repository.

library(beset); library(tidyverse)

load("utmodel-all.Rdata")
load("mtmodel-all.Rdata")
load("adomodel-all.Rdata")

load("model_summaries.Rdata")

Plots of the cross-validated cross-entropy errors

The “best model” was the one with the fewest predictors whose cross-validated error still fell within one standard error of the absolute best model.
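The one-standard-error rule can be sketched as follows. The `cv_err` and `cv_se` vectors below are hypothetical illustrative values (mean cross-validated error and its standard error by number of predictors, starting at the 0-predictor null model), not the study's data:

```r
# One-standard-error rule: pick the smallest model whose CV error
# falls within 1 SE of the lowest CV error observed.
cv_err <- c(1.00, 0.80, 0.65, 0.60, 0.59, 0.59)  # mean CV error for 0..5 predictors
cv_se  <- c(0.04, 0.04, 0.03, 0.03, 0.03, 0.03)  # standard error of each estimate

best      <- which.min(cv_err)              # index of the absolute best model
threshold <- cv_err[best] + cv_se[best]     # 1-SE cutoff
chosen    <- which(cv_err <= threshold)[1]  # fewest predictors within the cutoff

chosen - 1  # number of predictors in the selected model; here, 3
```

With these made-up values, the 3-predictor model (error 0.60) is the smallest model within one standard error (0.62) of the best (0.59), so it is chosen over the 4- and 5-predictor models.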

Plot of cross-entropy errors

To make the errors more comparable across models, we standardized them against the null model (the model with 0 predictors). Lower standardized cross-entropy error indicates better model fit.
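One plausible form of this standardization (a sketch with made-up numbers, assuming division by the null-model error; the exact transformation used is in the full .Rmd) rescales each sample's errors so that 1.0 corresponds to the null model:

```r
# Hypothetical CV cross-entropy errors for two samples on different raw
# scales; dividing by each sample's null-model (0-predictor) error puts
# them on a common scale where 1.0 = null-model fit.
errs_a <- c(null = 0.90, m1 = 0.70, m2 = 0.60)
errs_b <- c(null = 1.40, m1 = 1.05, m2 = 0.95)

std_a <- errs_a / errs_a[["null"]]
std_b <- errs_b / errs_b[["null"]]

round(std_a, 3)  # values below 1 indicate improvement over the null model
```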

Plot of deviance (R2D)

This is analogous to R2, but quantifies deviance explained rather than variance explained. Higher R2D indicates more deviance explained and thus a better model.
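As a concrete illustration (using a built-in base-R example, not the study's data), R2D for a GLM can be computed from the fitted object's residual and null deviances:

```r
# R-squared as the fraction of deviance explained, illustrated with the
# small Poisson example from ?glm (Dobson, 1990) -- not the study's data.
counts    <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome   <- gl(3, 1, 9)
treatment <- gl(3, 3)
fit <- glm(counts ~ outcome + treatment, family = poisson())

# 1 - (residual deviance / null deviance): the share of the null model's
# deviance that the predictors account for.
r2d <- 1 - fit$deviance / fit$null.deviance
round(r2d, 3)
```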

Models

Note on the R-squared estimate (deviance explained):

beset also identifies the R2D (deviance explained) for the models. To quote from the function’s explanations, it “calculates R-squared as the fraction of deviance explained, which … generalizes to exponential family regression models. [It] also returns a predictive R-squared for how well the model predicts responses for new observations and/or a cross-validated R-squared with a bootstrapped confidence interval.”

The fitting procedure also estimates theta, the size (dispersion) parameter of the negative binomial distribution. These estimates are printed along with each model summary.
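The summaries below report the family as, e.g., Negative Binomial(6.1336), which is the style used by MASS::glm.nb. A minimal sketch of theta estimation on simulated data (illustrative only; this assumes a MASS-style negative binomial fit, not beset's exact internals):

```r
library(MASS)  # glm.nb jointly estimates coefficients and theta

set.seed(1)
# Simulated overdispersed counts with a known size parameter of 2
x <- rnorm(500)
y <- rnbinom(500, size = 2, mu = exp(1 + 0.5 * x))

fit <- glm.nb(y ~ x)
fit$theta  # estimated size/dispersion parameter; should land near 2
```

Larger theta means less overdispersion (as theta grows, the negative binomial approaches the Poisson).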

College Students Sample model:


======================================================= 
Best Model:
  dep ~ num.neg.endorsed + v.positive + szr 

16 Nearly Equivalent Models:
  dep ~ num.neg.endorsed + zr.negative + v.positive
  dep ~ num.neg.endorsed + numSRnegrecalled + v.positive
  dep ~ num.neg.endorsed + numposrecalled + v.positive
  dep ~ num.neg.endorsed + v.negative + v.positive
  dep ~ num.neg.endorsed + zr.positive + v.positive
  ...
   + 11 more
  ...

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.8128  -0.7494  -0.0423   0.4748   2.2857  

Coefficients:
                  Estimate Std. Error z value Pr(>|z|)    
(Intercept)       2.652969   0.074644  35.541  < 2e-16 ***
num.neg.endorsed  0.048017   0.005188   9.255  < 2e-16 ***
v.positive       -0.132866   0.017212  -7.719 1.17e-14 ***
szr              -0.464225   0.192406  -2.413   0.0158 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Negative Binomial(6.1336) family taken to be 1)

Log-likelihood: -1437 on 5 Df
AIC: 2883.6

Number of Fisher Scoring iterations: 1

Train-sample R-squared = 0.45, Test-sample R-squared = 0.43
Cross-validated R-squared = 0.44, 95% CI [0.42, 0.46]
=======================================================

Mturkers Sample model:


======================================================= 
Best Model:
  dep ~ num.neg.endorsed + v.negative + v.positive + st0 

34 Nearly Equivalent Models:
  dep ~ num.pos.endorsed + zr.negative + v.negative + st0
  dep ~ zr.negative + v.negative + v.positive + st0
  dep ~ numSRnegrecalled + v.negative + v.positive + st0
  dep ~ num.pos.endorsed + numSRnegrecalled + v.negative + st0
  dep ~ num.pos.endorsed + num.neg.endorsed + v.negative + st0
  ...
   + 29 more
  ...

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-2.74733  -0.88922  -0.09719   0.41308   2.42986  

Coefficients:
                 Estimate Std. Error z value Pr(>|z|)    
(Intercept)       2.34900    0.20878  11.251  < 2e-16 ***
num.neg.endorsed  0.03422    0.01309   2.614  0.00896 ** 
v.negative        0.13464    0.06529   2.062  0.03918 *  
v.positive       -0.10619    0.04256  -2.495  0.01259 *  
st0               1.43876    0.56658   2.539  0.01110 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Negative Binomial(2.337) family taken to be 1)

Log-likelihood: -746.9 on 6 Df
AIC: 1505.8

Number of Fisher Scoring iterations: 1

Train-sample R-squared = 0.43, Test-sample R-squared = 0.29
Cross-validated R-squared = 0.41, 95% CI [0.4, 0.43]
=======================================================

Adolescents Sample model:


======================================================= 
Best Model:
  dep ~ zr.negative + v.negative + v.positive + a 

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.8491  -1.1893  -0.2880   0.4967   3.1040  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  1.71379    0.31652   5.415 6.15e-08 ***
zr.negative  1.97101    0.46791   4.212 2.53e-05 ***
v.negative   0.43193    0.06897   6.263 3.78e-10 ***
v.positive  -0.24173    0.06231  -3.879 0.000105 ***
a           -0.52008    0.16460  -3.160 0.001580 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Negative Binomial(3.4068) family taken to be 1)

Log-likelihood: -384.7 on 6 Df
AIC: 781.39

Number of Fisher Scoring iterations: 1

Train-sample R-squared = 0.43, Test-sample R-squared = 0.41
Cross-validated R-squared = 0.4, 95% CI [0.36, 0.43]
=======================================================

Visualizing pairs of predictors

We also wrote functions (the code for which can be found in the full .Rmd document) to fit and compare models for every possible pair of predictors. Plotting these in the style of a correlation table lets us spot specific patterns in our models and then pose questions based on those patterns.

Thus, the following images plot, per sample, the mean cross-entropy errors for every two-predictor model. (Note that these are not the best models within 1 SE; those models are higher-dimensional and harder to visualize.) These plots, stitched into one figure, are included in the paper.
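The enumeration underlying such a grid can be sketched as follows. This uses the built-in mtcars data and in-sample deviance as stand-ins for the study's variables and cross-validated cross-entropy:

```r
# Fit a model for every pair of predictors and record an error measure,
# producing one row per pair -- the long format behind a matrix-style plot.
# mtcars and deviance are placeholders, not the study's data or metric.
predictors <- c("mpg", "wt", "hp", "drat")
pairs <- combn(predictors, 2)  # all 6 unordered pairs

grid <- lapply(seq_len(ncol(pairs)), function(j) {
  p   <- pairs[, j]
  f   <- reformulate(p, response = "am")        # e.g., am ~ mpg + wt
  fit <- glm(f, data = mtcars, family = binomial())
  data.frame(x = p[1], y = p[2], deviance = fit$deviance)
})
results <- do.call(rbind, grid)
results  # one row per predictor pair, ready to plot as a grid
```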

College Students

MTurk

Adolescents

Comparing specific models

Based on trends highlighted in the plots of the models, we ran small, specific comparisons within each sample. For example, we hypothesized that endorsements of positive and negative words alone were substantially better at predicting depression symptoms than so-called negative/positive processing biases (e.g., the ratio of the number of negative words endorsed to the total number of words endorsed). We tested these hypotheses with cross-validated R2D, calculated with the r2d() function from the beset package.
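The logic of such a comparison can be sketched by computing cross-validated R2D by hand on simulated data. This does not reproduce beset's r2d() interface; the variable names (n_neg, bias, dep) and the Poisson family are illustrative assumptions:

```r
# K-fold cross-validated deviance explained (R2D) for a Poisson GLM,
# computed manually for illustration. Not beset's r2d().
cv_r2d <- function(formula, data, k = 5) {
  set.seed(42)
  folds <- sample(rep(1:k, length.out = nrow(data)))
  # Poisson deviance of observed counts y against fitted means mu
  dev <- function(y, mu) 2 * sum(ifelse(y == 0, 0, y * log(y / mu)) - (y - mu))
  out <- sapply(1:k, function(i) {
    train <- data[folds != i, ]
    test  <- data[folds == i, ]
    fit   <- glm(formula, data = train, family = poisson())
    null  <- glm(update(formula, . ~ 1), data = train, family = poisson())
    y     <- model.response(model.frame(formula, test))
    1 - dev(y, predict(fit,  test, type = "response")) /
        dev(y, predict(null, test, type = "response"))
  })
  mean(out)
}

# Illustrative comparison on simulated data (not the study's variables):
# a raw count predictor versus a ratio-based "bias" score.
set.seed(1)
d <- data.frame(n_neg = rpois(300, 6))
d$n_total <- d$n_neg + rpois(300, 8)
d$bias    <- d$n_neg / d$n_total
d$dep     <- rpois(300, exp(0.5 + 0.15 * d$n_neg))

cv_r2d(dep ~ n_neg, d)  # count-based predictor
cv_r2d(dep ~ bias, d)   # ratio-based predictor
```

Comparing the two returned values mirrors the model comparisons described above: the predictor set with higher cross-validated R2D explains more out-of-sample deviance.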

These comparisons can be seen in the file on comparing specific models.